Locating Text in Historical Collection Manuscripts

Authors

  • Basilios Gatos
  • Ioannis Pratikakis
  • Stavros J. Perantonis
Abstract

Documents in historical collections are often poorly preserved and prone to degradation. This work applies state-of-the-art techniques in digital image binarization and text identification to digitized documents so that their content can be exploited efficiently. A novel methodology is proposed that preserves meaningful textual information in low-quality historical documents. The method was developed within the Hellenic GSRT-funded R&D project D-SCRIBE, which aims to develop an integrated system for the digitization and processing of old Hellenic manuscripts. In tests on numerous low-quality historical manuscripts, the methodology performed better than current state-of-the-art adaptive thresholding techniques.

Similar resources

Margins are more important than text, Historical values of margins, memorial notes and colophons of Manuscripts in Zoroastrian tradition

In the Zoroastrian tradition, the most important challenge and the most ambiguous issue is the ambiguity of history and the neglect of time and chronology. Perhaps this ambiguity stems from the view that historical time is limited, that the beginning and end of time are clear, and that goodness will eventually prevail. Since early times, the Zoroastrian re...

Within-text and Out-of-text Structures of Islamic-Iranian Manuscripts

Despite some differences, Islamic-Iranian manuscripts share particular out-of-text and within-text structures. Authors, scribes and transcription centers followed these structures for centuries, while the transcription tradition was dominant throughout the Islamic world. This article considers these common features in detail. Some manuscript folios preserved in National Library ...

Guideline: Multiple Hierarchies

As the title of the Dagstuhl Seminar Digital Historical Corpora Architecture, Annotation, and Retrieval already suggests, corpus architecture and corpus annotation are important topics for representing (historical) texts. In particular, the limitation of SGML-based markup languages to tree-structured annotations raises special problems when dealing with manuscripts: How is it possible to represe...

Iranian Scholars’ Revision of Their Submitted Manuscripts: Signaling Impersonality in Text

Nonnative English-speaking scholars have often been reported to be at a disadvantage vis-à-vis their English native counterparts when it comes to writing a publishable research article (RA). When they submit their manuscripts to English-language journals, they sometimes receive comments criticizing their faulty English. One area of difficulty for these authors is the grammaticalization of neutr...

Corrigendum: Virtual unrolling and deciphering of Herculaneum papyri by X-ray phase-contrast tomography

A collection of more than 1800 carbonized papyri, discovered in the Roman 'Villa dei Papiri' at Herculaneum, is the only classical library to have survived from antiquity. These papyri were charred during the 79 A.D. eruption of Vesuvius, a circumstance which providentially preserved them until now. This magnificent collection contains an impressive number of treatises by Greek philosophers and, especially, ...

Publication date: 2004